05_1_A_Benchmark_Methods_FittedVals
References
- Hyndman, R.J., & Athanasopoulos, G., 2021. Forecasting: principles and practice. 3rd edition.
- Fable package documentation
Libraries
library(fpp3)
Introduction
In this session we are going to:
- Explore some benchmark models. These are models that are simple in nature, but yet will serve as a measure of performance to which we will compare more complex models.
- If more complex models do not outperform these benchmarks, their complexity will not be justified.
- Introduce the concept of fitted values and compare it to forecasts.
- Practice the basic steps behind fitting models and forecasting with the library fable.
- Do some exercises.
For most of the simple examples in this notebook we will work with this time series studying the production of bricks between 1970 and 2004:
bricks <- aus_production %>%
filter_index("1970 Q1" ~ "2004 Q4") %>%
select(Bricks)
head(bricks)
# A tsibble: 6 x 2 [1Q]
Bricks Quarter
<dbl> <qtr>
1 386 1970 Q1
2 428 1970 Q2
3 434 1970 Q3
4 417 1970 Q4
5 385 1971 Q1
6 433 1971 Q2
Fitted values vs forecasts
Before using a model, we will need to fit that model (classic math terminology). This process is also known as training the model (machine learning terminology). This means that we will use data to produce estimates of the parameters of the model. We will call this data training data.
Note a few important technical details:
- I talk about estimates of the parameters because we are estimating these parameters from a particular sample, or, in our case, from a particular realisation of the time series.
- Notation:
- \(\beta\) - true value of the parameter.
- \(\hat{\beta}\) - estimate of the parameter.
- Many times we will abuse this notation and write \(\beta\) instead of \(\hat{\beta}\), but know that, when fitting a model, we are producing estimates of the parameters.
- If we fitted the model to another sample or to another realization of the time series (think of this as repeating an experiment), we would produce different estimates of those parameters.
- Being estimates, those parameters have associated standard errors.
Fitted values - example with linear regression
Now let us resort to your knowledge of linear regression to illustrate the concept of fitted values. When fitting a linear regression model to predict a variable \(y\) based on the values of a predictor \(x\), we take a sample of data (our training data) and produce estimates of the model parameters.
Once these estimates \(\hat{\beta_0}\) and \(\hat{\beta_1}\) have been produced, we can compute the values of \(y\) predicted by the model for the specific values of \(x\) of the training data. The values of \(y\) returned by the model would be the fitted values \(\hat{y_t}\) of the linear regression model.
At each point there is a difference between the actual value of the training data \(y_t\) and the fitted value \(\hat{y_t}\). This is what we call the random error. The word error here should not be understood as a mistake, but rather as anything that may affect \(y_t\) other than the linear effect of \(x_t\):
\[ y_t - \hat{y_t} = \varepsilon_t \]
The image below illustrates this:
Time Series case - Fitted values and forecasts
The following figure will serve to clarify the difference between the concepts of fitted values and forecasts. This figure summarizes the forecasting task and is very relevant for both this and the second part of the course:
In this figure:
- The black dots represent the observations \(y_t\) available at the time of forecasting. These extend from \(t=1\) to \(t=T\).
- The blue dots represent the fitted values, values returned by the model we have fitted in the training region producing estimates of the model parameters.
- These fitted values are defined as one-step ahead forecasts on the training data: \(\hat{y}_{t|t-1}\). More on this later.
- The observations minus the fitted values are the residuals of the model.
- The green dots represent the forecasts beyond the available data, at a horizon \(h\): \(\hat{y}_{T+h|T}\).
- These forecasts are produced by the model using the data in the training region from \(t=1\) to \(t=T\).
- The forecasts extend beyond the training region. Hence \(\hat{y}_{T+h|T}\), meaning: forecast at \(t=T+h\) given all the information up to \(t=T\).
- The orange dots represent future observations not available at the time of forecasting. Once these are available, we could compute the forecast errors.
More on these concepts later on in the notebook.
Forecasts - \(\hat{y}_{T+h|T}\)
When fitting a time series model to a given realization of a time series - that is, to our training data - the goal is to produce estimates of the model parameters. Once the model is fitted (once we have those estimates), we may produce forecasts beyond the training data. Let us assume that:
- The training data starts at \(t=1\) and ends at \(t=T\).
- The forecasts are produced for a horizon \(h\).
The notation for the forecast at \(T+h\) given everything that has occurred up to time \(T\) (end of the training data) is:
\[ \begin{align*} \text{Forecast at time T+h based on what has happened up to time T:} && \hat{y}_{T+h|T} \end{align*} \]
Fitted values - \(\hat{y}_{t|t-1}\)
Fitted values are one-step in-sample (i.e., in the time region of the training data) “forecasts” (we will see some exceptions to this definition):
- one-step ahead means that the forecast only extends one time step into the future (\(h=1\)).
- To produce the one-step ahead forecast at time \(t\), only information up to \(t-1\) is considered.
- in-sample means that the forecasts belong to the time-region of the training data (training data = the time series to which we fit the model).
\[ \begin{align*} \text{Fitted value at time t based on what has happened up to time t-1:} && \hat{y}_{t|t-1} \end{align*} \]
Given their definition as one-step ahead forecasts, we denote the fitted value at time \(t\) as \(\hat{y}_{t|t-1}\), meaning the forecast is based on observations up to \(t-1\), i.e. {\(y_{1},\dots,y_{t-1}\)}.
We will abuse notation and sometimes refer to the fitted values simply as \(\hat{y_t}\) instead of \(\hat{y}_{t|t-1}\).
In later sessions we will see that, for many models, the equation for the fitted values will be obtained from the equation for a forecast at a horizon \(h\) as follows:
\[ \begin{align*} \text{Forecast at a horizon h given information up to t=T}: && \hat{y}_{T+h|T} \\ \text{Particularize at h = 1}: && \hat{y}_{T+1|T} \\ \text{Particularize equation at t instead of T+1}: && \hat{y}_{t|t-1} \end{align*} \]
The last change in the equation (particularize at \(t\) instead of \(T+1\)) is simply substituting \(T\) by \(t-1\). That is, we are using the change of variable \(t=T+1 \rightarrow T=t-1\). Hence:
\[ \begin{equation} \hat{y}_{T+1|T} \underset{\substack{\uparrow \\ \text{C.V:} \\ T = t-1}}{=} \hat{y}_{t-1+1|t-1} = \hat{y}_{t|t-1} \end{equation} \]
The graph below shows an example of all these concepts:
- The black line represents the bricks time series.
- The dashed red line represents the fitted values resulting from fitting the Naive model to this time series. As we will see later, these are one-step ahead forecasts.
- The blue line represents the forecasts given by the Naive model fitted to the data. These extend several time steps into the future.
- Confidence regions of 80% and 95% are also depicted.
Forecasts compared to fitted values - summary
The table below summarizes the comparison between the concepts of fitted values vs. forecasts that we explained in the foregoing section:
| Property | Forecasts - \(\hat{y}_{T+h|T}\) | Fitted values - \(\hat{y}_{t|t-1}\) |
|---|---|---|
| Definition | Forecast at \(T+h\) given everything that occurred up to \(T\) | Fitted value at \(t\) given everything that occurred up to \(t-1\) |
| Range | Extend beyond the range of time of the training data (beyond \(T\)) | Limited to the time range corresponding to the training data (\(t \leq T\)) |
| Steps | Can be multi-step, that is, for a horizon \(h>1\) | Are one-step ahead (\(h=1\)) |
Importantly, sometimes fitted values will not be strictly one-step ahead forecasts (for example, in the Mean model). When we encounter examples of this, I will mention it explicitly. But the definition holds in the general case.
Point Forecast and Forecast Distribution
To take a statistical approach to the forecasting task and be able to account for the uncertainty, it is best to consider a time series as a collection of random variables indexed over time. A collection of random variables indexed over time is a type of stochastic process (there are many possible structures for a stochastic process; a time series is just one of them).
- Past values of the time series {\({y_1, y_2, \dots, y_T}\)} are random variables that have been realized, that is, they have taken a specific outcome, a number, they have collapsed to a single value.
- Future values of the time series, the forecasts {\(\hat{y}_{T+h|T}\)}, are random variables that have yet to be realized.
Let us look at the following example of international arrivals to Australia, in which a model has been fitted and some forecasts have been produced:
If we focus on the specific forecast region, each point \(T+h\) in the future has an associated forecast distribution:
More technically, at each point \(T+h\) in the future corresponding to the forecast region we have:
- \(y_{T+h|T}\), the random variable at \(T+h\) given the information available up to time \(T\)
- A forecast distribution associated to the random variable \(y_{T+h|T}\) that describes the probability of occurrence of specific values.
- A point forecast \(\hat{y}_{T+h|T}\) that is the mean of the forecast distribution (the median is also used sometimes).
The most important concepts signaled here are:
- Forecast distribution as the probability density function describing the probabilities associated to the set of values that the forecast \(y_{T+h|T}\) may take.
- Point forecast \(\hat{y}_{T+h|T}\) as the mean value of the forecast distribution.
Residuals
The residuals are the difference between the observations and the corresponding fitted values. At a specific point \(t\), the value of the residuals is:
\[ e_{t} = y_{t}-\hat{y}_{t}. \]
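As a quick illustration of this formula, here is a minimal base-R sketch; the fitted values are hypothetical toy numbers, not the output of any fitted model:

```r
# Residuals e_t = y_t - y_hat_t, computed by hand on toy data
y     <- c(386, 428, 434)   # observations
y_hat <- c(416, 416, 416)   # hypothetical fitted values
e     <- y - y_hat          # residuals
e                           # -30  12  18
```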
Innovation residuals
These are the residuals in the transformed domain, only relevant if a transformation has been used. If no transformation has been used, residuals and innovation residuals will match.
Let us look at an example. If we use a logarithmic transformation on the data:
- Original data: \(y_t\)
- Transformed data: \(w_t = log(y_t)\)
We would fit the model in the transformed domain, to our transformed data, hence generating
- Fitted values in the transformed domain: \(\hat{w}_{t|t-1} \underset{\underset{\text{Notation}}{\uparrow}}{=} \hat{w_t}\)
- Residuals in the transformed domain: \(w_t - \hat{w_t}\)
By reversing the transformation (back-transforming) we would obtain:
- Fitted values in the original domain: \(\hat{y_t} = e^{\hat{w}_t}\)
- Residuals in the original domain: \(y_t - \hat{y_t}\)
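The mechanics can be sketched in a few lines of base R. The fitted values below are hypothetical (a lagged series, as a naive model would produce in the log domain), just to show the two kinds of residuals:

```r
y     <- c(100, 110, 120)   # original data
w     <- log(y)             # transformed data w_t = log(y_t)
w_hat <- c(NA, w[1], w[2])  # hypothetical fitted values in the log domain
innov <- w - w_hat          # innovation residuals (transformed domain)
y_hat <- exp(w_hat)         # back-transformed fitted values
resid <- y - y_hat          # ordinary residuals (original domain)
```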
We will see later in this notebook that, in the fpp3 library, we can obtain both the innovation residuals and the residuals using the function augment().
Mean model
Forecasts of all future values are equal to the average of the historical data.
\[ \hat{y}_{T+h|T} = \bar{y} = (y_1 + y_2 + y_3 +\dots + y_{T})/T. \]
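Before turning to fable, the formula can be sketched directly in base R (toy series and an arbitrary horizon, just to show the arithmetic):

```r
# Mean method by hand: the point forecast at every horizon is the historical average
y <- c(386, 428, 434, 417, 385, 433)  # first few Bricks observations from above
h <- 3                                # forecast horizon
mean_forecast <- rep(mean(y), h)      # same value repeated for h = 1, 2, 3
mean_forecast
```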
Defining and training the model (generating the fitted values or estimates)
fit <-
bricks %>%
model(
mean = MEAN(Bricks)
)
fit
# A mable: 1 x 1
mean
<model>
1 <MEAN>
This particular mable (model table) has 1 row (1 time series fitted) and 1 column (1 model).
Fitted values
The function augment() allows us to extract some relevant information contained within the model objects generated by the fpp3 library. Among other things, fitted values and residuals. Note that the model also contains the original time series (column Bricks).
fitted_vals <-
fit %>%
augment()
head(fitted_vals)
# A tsibble: 6 x 6 [1Q]
# Key: .model [1]
.model Quarter Bricks .fitted .resid .innov
<chr> <qtr> <dbl> <dbl> <dbl> <dbl>
1 mean 1970 Q1 386 451. -64.9 -64.9
2 mean 1970 Q2 428 451. -22.9 -22.9
3 mean 1970 Q3 434 451. -16.9 -16.9
4 mean 1970 Q4 417 451. -33.9 -33.9
5 mean 1971 Q1 385 451. -65.9 -65.9
6 mean 1971 Q2 433 451. -17.9 -17.9
The mean model is a good example in which fitted values are not one-step ahead forecasts on the training data, because it actually uses data beyond the point at which the fitted value is computed to produce the fitted values.
- That is, the fitted value at every point is computed from all the available data \(t=1, 2, 3, \cdots, T\) as the mean \((y_1 + y_2 + y_3 +\dots + y_{T})/T\).
- As an example, the fitted value at \(t=3\) (\(\hat{y}_{3|2}\)) is not computed solely using \(y_1\) and \(y_2\) as the strict definition says, but rather using all the information from \(t=1, 2, 3, \cdots, T\). That is why the mean model is an exception in terms of the definition of fitted values.
fitted_vals %>%
filter(.model == "mean") %>%
autoplot(Bricks, colour = "gray") +
geom_line(aes(y=.fitted), colour = "blue", linetype = "dashed")
Forecasts
To produce forecasts in the fable library we use the forecast() function on the fitted model. We specify an argument: h, the number of time steps we wish to forecast.
# produce forecasts for 8 timesteps
forecasts <- fit %>% forecast(h = 8)
forecasts
# A fable: 8 x 4 [1Q]
# Key: .model [1]
.model Quarter Bricks .mean
<chr> <qtr> <dist> <dbl>
1 mean 2005 Q1 N(451, 4022) 451.
2 mean 2005 Q2 N(451, 4022) 451.
3 mean 2005 Q3 N(451, 4022) 451.
4 mean 2005 Q4 N(451, 4022) 451.
5 mean 2006 Q1 N(451, 4022) 451.
6 mean 2006 Q2 N(451, 4022) 451.
7 mean 2006 Q3 N(451, 4022) 451.
8 mean 2006 Q4 N(451, 4022) 451.
The output is a so-called fable (forecast table). Some comments about the result:
- The column .mean represents the forecast for each specific horizon \(h=1, 2, \dots\). That is, the point forecast \(\hat{y}_{T+h|T}\). The notation .mean is used because it refers to the mean of the forecast distribution, as was explained earlier in this session.
- The column Bricks (or whatever the variable name is) in the resulting fable contains a distribution object detailing the forecast distribution at each forecast horizon.
  - For the case at hand we can see that these are normal distributions with \(\mu = 451\) and \(\sigma^2 = 4022\).
  - We will see how to handle these distribution objects later in the subject.
Depicting forecasts along with the time series
Important: to depict the forecasts along with the time series use autoplot on the fable object (in our case the variable forecasts) and then specify the original tsibble object within autoplot (in this case bricks). This will result in the forecasts being depicted along the original time series:
# To depict the forecasts and the original series use autoplot() with the
forecasts %>%
# Depicts the original time series and the forecasts
autoplot(bricks, level = FALSE)
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.
If you now wish to add the fitted values, you may use geom_line() to add another layer to the graph:
# To depict the forecasts and the original series use autoplot() with the
forecasts %>%
# Depicts the original time series and the forecasts
autoplot(bricks, level = FALSE) +
# Overlays the fitted values
geom_line(data = fitted_vals, aes(y = .fitted), colour = "blue", linetype = "dashed")
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.
Finally, if you want to include the confidence intervals, remove the statement level = FALSE:
# To depict the forecasts and the original series use autoplot() with the
forecasts %>%
# Depicts the original time series and the forecasts
autoplot(bricks) +
# Overlays the fitted values
geom_line(data = fitted_vals, aes(y = .fitted), colour = "blue", linetype = "dashed")
Estimates of the model parameters
Using the function tidy() on a fitted model, we can get the estimates of the parameters of the model. In this case there is only one parameter, the mean of the sample.
tidy(fit)
# A tibble: 1 × 6
.model term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 mean mean 451. 5.34 84.4 2.58e-121
Naïve model
The Naïve model sets all forecasts to be the value of the last available observation (observation at \(t=T\))
\[ \hat{y}_{T+h|T}=y_T \]
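In base R the method amounts to repeating the last observation; a minimal sketch with toy numbers (not the fable implementation):

```r
# Naive method by hand: every forecast equals the last observation y_T
y <- c(13.0, 12.4, 13.9, 13.0, 14.9)   # toy series
h <- 4                                 # forecast horizon
naive_forecast <- rep(y[length(y)], h) # y_T repeated for every horizon
naive_forecast
```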
Let us use the aus_exports dataset to illustrate this with an example:
aus_exports <- filter(global_economy, Country == "Australia")
autoplot(aus_exports, Exports)
Defining and training the model (generating the fitted values or estimates)
fit <- aus_exports %>% model(Naive = NAIVE(Exports))
fit
# A mable: 1 x 2
# Key: Country [1]
Country Naive
<fct> <model>
1 Australia <NAIVE>
Again the mable has 1 row (we are only fitting the model to 1 time series) and one column for the model (we only fitted one model).
Fitted values
The naïve model produces the fitted values by calculating a one-step ahead forecast from the previous observation.
By setting \(h=1\) on the forecasts of the Naive model, particularized at \(t+1\) instead of \(T+1\) to remain in the training region, we obtain the equation for the fitted values:
\[ \hat{y}_{t+1|t}=y_t \]
This gives us the fitted value at \(t+1\). We want the fitted value at \(t\), so we simply shift that equation one time-step to the past accordingly. Wherever we have \(t\), we write \(t-1\) and simplify, obtaining:
\[ \hat{y}_{t|t-1}=y_{t-1} \]
This is the equation for the fitted value at \(t\) in the case of the Naïve model. The fitted value at \(t\) is simply the observation at \(t-1\). Note that, as a result, there will be no fitted value at \(t=1\) for the Naïve model, since there is no previous value.
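In other words, the vector of fitted values is just the series lagged by one step, with an NA in the first position. A base-R sketch on toy data:

```r
# Naive fitted values by hand: y_hat_t = y_{t-1}, no fitted value at t = 1
y <- c(13.0, 12.4, 13.9, 13.0)
fitted_naive <- c(NA, y[-length(y)])  # shift the series one step forward
fitted_naive                          # NA 13.0 12.4 13.9
```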
Let us depict the fitted values for our example:
# Extract fitted values and inspect table
fitted_vals <- fit %>% augment()
head(fitted_vals)
# A tsibble: 6 x 7 [1Y]
# Key: Country, .model [1]
Country .model Year Exports .fitted .resid .innov
<fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Australia Naive 1960 13.0 NA NA NA
2 Australia Naive 1961 12.4 13.0 -0.591 -0.591
3 Australia Naive 1962 13.9 12.4 1.54 1.54
4 Australia Naive 1963 13.0 13.9 -0.937 -0.937
5 Australia Naive 1964 14.9 13.0 1.93 1.93
6 Australia Naive 1965 13.2 14.9 -1.72 -1.72
# Print fitted values along with the original series
fitted_vals %>%
filter(.model == "Naive") %>%
autoplot(Exports, colour = "gray") +
geom_line(aes(y=.fitted), colour = "blue", linetype = "dashed")
Warning: Removed 1 row containing missing values (`geom_line()`).
The above graph looks very similar to the first lag of the time series! In fact, that is exactly what it is.
Forecasts
# produce forecasts for 8 timesteps
forecasts <- fit %>% forecast(h = 8)
forecasts
# A fable: 8 x 5 [1Y]
# Key: Country, .model [1]
Country .model Year Exports .mean
<fct> <chr> <dbl> <dist> <dbl>
1 Australia Naive 2018 N(21, 1.5) 21.3
2 Australia Naive 2019 N(21, 3.1) 21.3
3 Australia Naive 2020 N(21, 4.6) 21.3
4 Australia Naive 2021 N(21, 6.1) 21.3
5 Australia Naive 2022 N(21, 7.6) 21.3
6 Australia Naive 2023 N(21, 9.2) 21.3
7 Australia Naive 2024 N(21, 11) 21.3
8 Australia Naive 2025 N(21, 12) 21.3
# Depict the forecasts
forecasts %>%
autoplot(aus_exports, level = FALSE) +
# Overlays the fitted values
geom_line(data = fitted_vals, aes(y = .fitted), colour = "blue", linetype = "dashed")
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.
Warning: Removed 1 row containing missing values (`geom_line()`).
As previously explained, unlike the fitted values, the forecasts can extend multiple steps ahead into the future.
Estimates of the model parameters
tidy(fit)
# A tibble: 0 × 7
# ℹ 7 variables: Country <fct>, .model <chr>, term <chr>, estimate <dbl>,
# std.error <dbl>, statistic <dbl>, p.value <dbl>
As we can see, the Naïve model does not have any parameters. It just forecasts by extending the last observation and does not need to estimate any parameter.
Seasonal naïve
Each forecast is equal to the last observed value from the same season one season ago.
- For example: the same value of the series the same month of the previous year.
Formally, the forecast at time T+h is
\[ \hat{y}_{T+h|T} = y_{T+h-m(k+1)} \]
where:
- \(m\) is the seasonal period (length in timesteps of one season)
- \(k\) is the integer part of \((h-1)/m\) (i.e., the number of complete years in the forecast period prior to time \(T+h\))
- Example: for monthly data (\(m=12\)):
  - \(1 \leq h \leq 12\) (forecasts of up to one season ahead):
    - \(0 \leq (h-1)/m < 1 \rightarrow k = \text{int}\left(\frac{h-1}{m}\right) = 0\).
    - Since \(k=0 \rightarrow \hat{y}_{T+h|T}=y_{T+h-m(0+1)}=y_{T+h-m}\)
    - i.e.: \(\hat{y}_{T+h|T}=y_{T+h-m}\): the forecast is equal to the matching observation of the last available season in the historical data.
  - \(13 \leq h \leq 24\) (forecasts of up to two seasons ahead):
    - \(1 \leq (h-1)/m < 2 \rightarrow k = \text{int}\left(\frac{h-1}{m}\right) = 1\).
    - Since \(k=1 \rightarrow \hat{y}_{T+h|T}=y_{T+h-m(1+1)}=y_{T+h-2m}\)
    - i.e.: \(\hat{y}_{T+h|T}=y_{T+h-2m}\): the forecast is still equal to the matching observation of the last available season in the historical data.
  - \(25 \leq h \leq 36\) (forecasts of up to three seasons ahead):
    - \(2 \leq (h-1)/m < 3 \rightarrow k = \text{int}\left(\frac{h-1}{m}\right) = 2\).
    - Since \(k=2 \rightarrow \hat{y}_{T+h|T}=y_{T+h-m(2+1)}=y_{T+h-3m}\)
    - i.e.: \(\hat{y}_{T+h|T}=y_{T+h-3m}\): the forecast is still equal to the matching observation of the last available season in the historical data.
  - \(37 \leq h \leq 48\) (forecasts of up to four seasons ahead):
    - …
The interpretation of the formula is therefore quite simple. In the case of monthly data:
- all future February values will be equal to the last observed February value.
- all future March values will be equal to the last observed March value.
- …
This will be better understood with the examples below.
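The index arithmetic \(T+h-m(k+1)\) can also be checked with a small helper function. This is only an illustrative sketch; `snaive_index` is a hypothetical name, not a fable function:

```r
# Which past observation does the seasonal naive forecast reuse?
snaive_index <- function(T, h, m = 12) {
  k <- (h - 1) %/% m   # complete seasons in the forecast period before T + h
  T + h - m * (k + 1)
}
snaive_index(120, 1)   # 109: one month ahead uses the matching month of the last year
snaive_index(120, 13)  # 109: thirteen months ahead reuses the very same observation
```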
Let us do an example with the us_employment series:
employed <- filter(us_employment, Title == "Total Private", Month >= yearmonth("01-01-2010"))
head(employed)
# A tsibble: 6 x 4 [1M]
# Key: Series_ID [1]
Month Series_ID Title Employed
<mth> <chr> <chr> <dbl>
1 2010 Jan CEU0500000001 Total Private 105444
2 2010 Feb CEU0500000001 Total Private 105490
3 2010 Mar CEU0500000001 Total Private 106176
4 2010 Apr CEU0500000001 Total Private 107220
5 2010 May CEU0500000001 Total Private 107931
6 2010 Jun CEU0500000001 Total Private 108710
autoplot(employed)
Plot variable not specified, automatically selected `.vars = Employed`
Defining and training the model
fit <- employed %>% model(SNaive = SNAIVE(Employed))
fit
# A mable: 1 x 2
# Key: Series_ID [1]
Series_ID SNaive
<chr> <model>
1 CEU0500000001 <SNAIVE>
NOTE: we may use the special function lag to specify the seasonal period the method is to consider. Most of the time this is not necessary, but in case there are multiple possible seasonal periods, you may want to specify which one SNAIVE() should use.
# In this case it is not necessary to specify lag
fit <- employed %>% model(SNaive = SNAIVE(Employed ~ lag("year")))
fit
# A mable: 1 x 2
# Key: Series_ID [1]
Series_ID SNaive
<chr> <model>
1 CEU0500000001 <SNAIVE>
Fitted values
Because the method generates forecasts by using the corresponding observation from the previous season, the model does not generate fitted values for the first season (remember fitted values are one-step ahead forecasts):
# Extract fitted values and inspect table
fitted_vals <- fit %>% augment()
head(fitted_vals)
# A tsibble: 6 x 7 [1M]
# Key: Series_ID, .model [1]
Series_ID .model Month Employed .fitted .resid .innov
<chr> <chr> <mth> <dbl> <dbl> <dbl> <dbl>
1 CEU0500000001 SNaive 2010 Jan 105444 NA NA NA
2 CEU0500000001 SNaive 2010 Feb 105490 NA NA NA
3 CEU0500000001 SNaive 2010 Mar 106176 NA NA NA
4 CEU0500000001 SNaive 2010 Apr 107220 NA NA NA
5 CEU0500000001 SNaive 2010 May 107931 NA NA NA
6 CEU0500000001 SNaive 2010 Jun 108710 NA NA NA
# Print fitted values along with the original series
fitted_vals %>%
filter(.model == "SNaive") %>%
autoplot(Employed, colour = "gray") +
geom_line(aes(y=.fitted), colour = "blue", linetype = "dashed")
Warning: Removed 12 rows containing missing values (`geom_line()`).
Forecasts
# produce forecasts for 36 timesteps
forecasts <- fit %>% forecast(h = 36)
forecasts
# A fable: 36 x 5 [1M]
# Key: Series_ID, .model [1]
Series_ID .model Month Employed .mean
<chr> <chr> <mth> <dist> <dbl>
1 CEU0500000001 SNaive 2019 Oct N(128001, 5538565) 128001
2 CEU0500000001 SNaive 2019 Nov N(128415, 5538565) 128415
3 CEU0500000001 SNaive 2019 Dec N(128363, 5538565) 128363
4 CEU0500000001 SNaive 2020 Jan N(125932, 5538565) 125932
5 CEU0500000001 SNaive 2020 Feb N(126370, 5538565) 126370
6 CEU0500000001 SNaive 2020 Mar N(126994, 5538565) 126994
7 CEU0500000001 SNaive 2020 Apr N(128007, 5538565) 128007
8 CEU0500000001 SNaive 2020 May N(128771, 5538565) 128771
9 CEU0500000001 SNaive 2020 Jun N(129800, 5538565) 129800
10 CEU0500000001 SNaive 2020 Jul N(129883, 5538565) 129883
# ℹ 26 more rows
# Depict the forecasts
forecasts %>%
autoplot(employed, level = FALSE) +
autolayer(fitted_vals, .fitted, colour = "blue", linetype = "dashed")
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.
Warning: Removed 12 rows containing missing values (`geom_line()`).
Estimates of the model parameters
tidy(fit)
# A tibble: 0 × 7
# ℹ 7 variables: Series_ID <chr>, .model <chr>, term <chr>, estimate <dbl>,
# std.error <dbl>, statistic <dbl>, p.value <dbl>
As we can see, the seasonal naïve model, just like the naïve model, does not have any parameters.
Drift method
The drift method allows forecasts to increase or decrease over time, with the forecasts changing by a specific constant, called the drift, for every unit of time.
- The drift is the amount of change per unit time, set to be the average change in the historical data.
That average change in the historical data will in the end be the slope of the line joining the first and last point in the training data. The figures below explain this in more detail:
The drift method extrapolates this average change into the future to compute \(\hat{y}_{T+h|T}\), simply multiplying the average change by \(h\). In other words, it creates a straight line covering the entire forecast horizon with a slope equal to the average change. Mathematically, it works as follows:
If \(\Delta y_t = y_t - y_{t-1}\) then the equation for the drift method may be written as:
\[ \begin{equation} \hat{y}_{T+h|T} = y_{T} + h \underbrace{\frac{\sum_{t=2}^{t=T}\Delta y_t}{T-1}}_{\substack{\text{average change of } y \\ \text{in the historical data}}} = y_{T} + h \ \frac{\sum_{t=2}^{t=T} (y_{t}-y_{t-1})}{{T-1}} = y_{T} + h \left( \frac{y_{T} -y_{1}}{T-1}\right) \end{equation} \]
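The last expression makes the method easy to sketch in base R (toy series, not the employed data; `drift_forecast` is a hypothetical helper, not a fable function):

```r
# Drift method by hand: extend a line from y_T with the average historical change
drift_forecast <- function(y, h) {
  T_ <- length(y)
  slope <- (y[T_] - y[1]) / (T_ - 1)  # average change in the historical data
  y[T_] + (1:h) * slope               # straight line over the forecast horizon
}
drift_forecast(c(10, 12, 11, 14, 16), h = 3)  # 17.5 19.0 20.5
```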
Defining and training the model
- NOTE: RW stands for Random Walk. More about this later in the subject, in the context of ARIMA models.
fit <- employed %>% model(Drift = RW(Employed ~ drift()))
fit
# A mable: 1 x 2
# Key: Series_ID [1]
Series_ID Drift
<chr> <model>
1 CEU0500000001 <RW w/ drift>
The drift method is equivalent to drawing a line between the first and last observations and extrapolating it into the future (the final graph of this example clarifies this further).
Fitted values
# Extract fitted values and inspect table
fitted_vals <- fit %>% augment()
head(fitted_vals)
# A tsibble: 6 x 7 [1M]
# Key: Series_ID, .model [1]
Series_ID .model Month Employed .fitted .resid .innov
<chr> <chr> <mth> <dbl> <dbl> <dbl> <dbl>
1 CEU0500000001 Drift 2010 Jan 105444 NA NA NA
2 CEU0500000001 Drift 2010 Feb 105490 105650. -160. -160.
3 CEU0500000001 Drift 2010 Mar 106176 105696. 480. 480.
4 CEU0500000001 Drift 2010 Apr 107220 106382. 838. 838.
5 CEU0500000001 Drift 2010 May 107931 107426. 505. 505.
6 CEU0500000001 Drift 2010 Jun 108710 108137. 573. 573.
# Print fitted values along with the original series
fitted_vals %>%
filter(.model == "Drift") %>%
autoplot(Employed, colour = "gray") +
geom_line(aes(y=.fitted), colour = "blue", linetype = "dashed")
Warning: Removed 1 row containing missing values (`geom_line()`).
The fitted values look very similar to the SNAIVE fitted values. But that is only because they are one-step-ahead forecasts. Note that values for the first observations are produced (unlike for the seasonal naïve case).
Forecasts
forecasts <- fit %>% forecast(h=8)
forecasts
# A fable: 8 x 5 [1M]
# Key: Series_ID, .model [1]
Series_ID .model Month Employed .mean
<chr> <chr> <mth> <dist> <dbl>
1 CEU0500000001 Drift 2019 Oct N(129518, 764786) 129518.
2 CEU0500000001 Drift 2019 Nov N(129724, 1542645) 129724.
3 CEU0500000001 Drift 2019 Dec N(129929, 2333578) 129929.
4 CEU0500000001 Drift 2020 Jan N(130135, 3137584) 130135.
5 CEU0500000001 Drift 2020 Feb N(130341, 4e+06) 130341.
6 CEU0500000001 Drift 2020 Mar N(130547, 4784816) 130547.
7 CEU0500000001 Drift 2020 Apr N(130752, 5628041) 130752.
8 CEU0500000001 Drift 2020 May N(130958, 6484340) 130958.
# Extract the initial and final points of the series
# to depict the average change slope.
n_rows = nrow(employed)
drift_points <- tibble(Month = c(employed$Month[1], employed$Month[n_rows]),
Employed = c(employed$Employed[1], employed$Employed[n_rows])) %>%
as_tsibble()
Using `Month` as index variable.
# Depict the forecasts
forecasts %>%
autoplot(employed, level = FALSE) +
autolayer(fitted_vals, .fitted, colour = "blue", linetype = "dashed") +
geom_line(drift_points, mapping = aes(x = Month, y = Employed), color = "red", linetype = "dashed")
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.
Warning: Removed 1 row containing missing values (`geom_line()`).
In the above graph we can see that:
- The forecasts are generated following the slope of the line that connects the initial and final points of the series (red slope). A further example is depicted in the image below:
- The fitted values are very similar to those of the Naïve method. Nonetheless, they are not the same: in this case each fitted value is generated by a one-step forecast using the drift method, that is, by following the slope of the line connecting each point of the series with the initial point and extending it one step into the future.
Estimates of the model parameters
We can look at the number of parameters in a model and their values using the function tidy(). If we apply it to the drift model we get:
tidy(fit)
# A tibble: 1 × 7
Series_ID .model term estimate std.error statistic p.value
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 CEU0500000001 Drift b 206. 80.8 2.54 0.0123
The drift model has only one parameter, the slope of the line joining the first and the last observation.
Forecasting with decomposition
If an appropriate decomposition algorithm for the time series has been found, then in many instances it is of interest to build two separate models:
- A model for the seasonally adjusted component
- A model for the seasonal component.
Afterwards the forecasts of both models may be combined together to yield the final forecast.
Recall that, if the seasonal component is \(\hat{S_t}\), the seasonally adjusted component \(\hat{A_t}\) of the time series is what results from removing the effect of seasonality from the time series.
The seasonally adjusted component is computed differently depending on whether the scheme is additive or multiplicative:
\[ \begin{align*} \text{For additive schemes} && \hat{A_t} = y_t - \hat{S_t} \\ \text{For multiplicative schemes} && \hat{A_t} = \frac{y_t}{\hat{S_t}} \\ \end{align*} \]
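These two operations are one-liners in base R. The seasonal components below are hypothetical toy values (not STL output), chosen so the additive adjustment comes out flat:

```r
y      <- c(100, 120, 90, 110)        # toy series
S_add  <- c(-5, 15, -15, 5)           # hypothetical additive seasonal component
A_add  <- y - S_add                   # additive seasonal adjustment: all 105
S_mult <- c(0.95, 1.15, 0.85, 1.05)   # hypothetical multiplicative component
A_mult <- y / S_mult                  # multiplicative seasonal adjustment
```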
The decomposed series can be written as:
- \(y_t = \hat{S}_t + \hat{A}_t\) for additive schemes
- where \(\hat{A}_t = \hat{T}_t+\hat{R}_{t}\) is the seasonally adjusted component.
- \(y_t = \hat{S}_t\hat{A}_t\) for multiplicative schemes
- where \(\hat{A}_t = \hat{T}_t\hat{R}_{t}\) is the seasonally adjusted component.
To forecast a decomposed time series, \(\hat{S_t}\) and \(\hat{A_t}\) are forecast separately:
- \(\hat{A_t}\): any non-seasonal forecasting method may be used (the drift method, Holt’s method, ARIMA…).
- \(\hat{S_t}\): usually assumed to remain unchanged, or to change very slowly. Many times it is forecast by taking the observations of the last season, that is, using simply a seasonal naïve model.
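The final forecast is then obtained by recombining the two component forecasts according to the scheme:

\[ \begin{align*} \text{For additive schemes} && \hat{y}_{T+h|T} = \hat{S}_{T+h|T} + \hat{A}_{T+h|T} \\ \text{For multiplicative schemes} && \hat{y}_{T+h|T} = \hat{S}_{T+h|T}\,\hat{A}_{T+h|T} \end{align*} \]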
Fitting the model
Let us again resort to the employed time series:
employed <- filter(us_employment, Title == "Total Private", Month >= yearmonth("01-01-2010"))
autoplot(employed)Plot variable not specified, automatically selected `.vars = Employed`
To fully understand the model specification, let us first look at the outcome of an STL decomposition applied to this series
employed %>%
model(
stl = STL(Employed)
) %>%
components()# A dable: 117 x 8 [1M]
# Key: Series_ID, .model [1]
# : Employed = trend + season_year + remainder
Series_ID .model Month Employed trend season_year remainder season_adjust
<chr> <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
1 CEU05000… stl 2010 Jan 105444 1.07e5 -2016. 272. 107460.
2 CEU05000… stl 2010 Feb 105490 1.07e5 -1816. -8.49 107306.
3 CEU05000… stl 2010 Mar 106176 1.07e5 -1231. -33.0 107407.
4 CEU05000… stl 2010 Apr 107220 1.08e5 -361. 14.7 107581.
5 CEU05000… stl 2010 May 107931 1.08e5 305. -69.4 107626.
6 CEU05000… stl 2010 Jun 108710 1.08e5 1006. -121. 107704.
7 CEU05000… stl 2010 Jul 108784 1.08e5 916. -85.9 107868.
8 CEU05000… stl 2010 Aug 108936 1.08e5 886. -36.8 108050.
9 CEU05000… stl 2010 Sep 108568 1.08e5 357. -9.10 108211.
10 CEU05000… stl 2010 Oct 108984 1.08e5 648. -17.2 108336.
# ℹ 107 more rows
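As a quick sanity check (a sketch using the decomposition above), we can verify that for this additive scheme season_adjust is just Employed minus season_year:

```r
# Sketch: for an additive STL decomposition, the seasonally adjusted
# series equals the observations minus the seasonal component.
employed %>%
  model(stl = STL(Employed)) %>%
  components() %>%
  as_tibble() %>%
  mutate(diff = Employed - season_year - season_adjust) %>%
  summarise(max_abs_diff = max(abs(diff)))
# max_abs_diff should be (numerically) zero
```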
The seasonal component is stored in a column named season_year and the seasonally adjusted component is stored in a column named season_adjust. To give a full specification to decomposition_model() we need to:
- Specify a decomposition scheme
- Provide a model for the seasonally adjusted component.
- Provide a model for the seasonal component (by default SNAIVE is used, so this is not strictly necessary).
fit <- employed %>%
model(
decomp = decomposition_model(
# Specify the decomposition scheme to be used.
STL(Employed),
# Specify a model for the seasonally adjusted component (in this case, a drift).
RW(season_adjust ~ drift()),
      # Specify a model for the seasonal component (unnecessary, since SNAIVE is the default).
SNAIVE(season_year)
)
)
fit# A mable: 1 x 2
# Key: Series_ID [1]
Series_ID decomp
<chr> <model>
1 CEU0500000001 <STL decomposition model>
Fitted values
As usual, you can extract the fitted values using augment():
fit %>% augment()# A tsibble: 117 x 7 [1M]
# Key: Series_ID, .model [1]
Series_ID .model Month Employed .fitted .resid .innov
<chr> <chr> <mth> <dbl> <dbl> <dbl> <dbl>
1 CEU0500000001 decomp 2010 Jan 105444 NA NA NA
2 CEU0500000001 decomp 2010 Feb 105490 NA NA NA
3 CEU0500000001 decomp 2010 Mar 106176 NA NA NA
4 CEU0500000001 decomp 2010 Apr 107220 NA NA NA
5 CEU0500000001 decomp 2010 May 107931 NA NA NA
6 CEU0500000001 decomp 2010 Jun 108710 NA NA NA
7 CEU0500000001 decomp 2010 Jul 108784 NA NA NA
8 CEU0500000001 decomp 2010 Aug 108936 NA NA NA
9 CEU0500000001 decomp 2010 Sep 108568 NA NA NA
10 CEU0500000001 decomp 2010 Oct 108984 NA NA NA
# ℹ 107 more rows
QUESTION: Why are the first rows of the fitted values NA?
Forecasts
fit %>%
  forecast() %>%
  autoplot(employed, level = NULL) +
  labs(y = "Number of people",
       title = "US retail employment")
The graph above shows that the model combines the SNAIVE and the drift forecasts (this is equivalent to adding the slope of the drift model to the seasonal naïve forecasts).
Estimates of the model parameters
When using the function tidy() on the decomposition model, we get an error:
tidy(fit)Error in `mutate()`:
ℹ In argument: `dplyr::across(all_of(mbl_vars), function(x) lapply(x,
tidy, ...))`.
Caused by error in `across()`:
! Can't compute column `decomp`.
Caused by error in `UseMethod()`:
! no applicable method for 'tidy' applied to an object of class "c('decomposition_model', 'model_combination')"
What this means is that decomposition_model() is actually a combination of two models, and the function tidy() is not programmed to deal with this. To get estimates of the model parameters, we should perform the decomposition ourselves and then fit the models separately to the seasonally adjusted component and the seasonal component. Inspecting each of those fitted objects yields the parameter estimates.
# Create decomposition model
dcmp <-
employed %>%
model(
stl = STL(Employed)
) %>%
components()
# Model for the seasonally adjusted data.
# Note that season_adjust is a column within dcmp
fit_seasonadj <-
dcmp %>%
select(season_adjust) %>%
model(
drift = RW(season_adjust ~ drift())
)
# Model for the seasonal data.
# Note that season_year is a column within dcmp
fit_seasonal <-
dcmp %>%
select(season_year) %>%
model(
snaive = SNAIVE(season_year)
)
# Parameters of drift model
tidy(fit_seasonadj)# A tibble: 1 × 6
.model term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 drift b 186. 8.66 21.5 4.48e-42
# Estimates of parameters for snaive model
tidy(fit_seasonal)# A tibble: 0 × 6
# ℹ 6 variables: .model <chr>, term <chr>, estimate <dbl>, std.error <dbl>,
# statistic <dbl>, p.value <dbl>
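As a final check (a sketch using the objects fitted above), the point forecasts of the combined decomposition model should equal the sum of the forecasts of the two sub-models:

```r
# Sketch: rebuild the combined forecast by hand and compare it with
# the forecast of the full decomposition model (fit).
h <- 12
fc_adj <- fit_seasonadj %>% forecast(h = h)   # drift on season_adjust
fc_sea <- fit_seasonal %>% forecast(h = h)    # snaive on season_year
fc_decomp <- fit %>% forecast(h = h)          # combined model
# Compare point forecasts: they should coincide up to numerical error.
tibble(
  manual = fc_adj$.mean + fc_sea$.mean,
  decomp = fc_decomp$.mean
) %>%
  mutate(diff = manual - decomp) %>%
  summarise(max_abs_diff = max(abs(diff)))
```

Note also that tidy(fit_seasonal) returns zero rows: the seasonal naïve model has no parameters to estimate.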
Exercise 1
For this exercise we are going to use the aus_retail dataset. Remember you can always run ?aus_retail in the console to read the details about the dataset.
The dataset contains a variety of series that can be filtered using their specific series ID. We will be working with the following Series ID: A3349767W. The following code filters the series for you.
retail_series <- aus_retail %>%
filter(`Series ID` == "A3349767W")
head(retail_series)# A tsibble: 6 x 5 [1M]
# Key: State, Industry [1]
State Industry `Series ID` Month Turnover
<chr> <chr> <chr> <mth> <dbl>
1 Northern Territory Clothing, footwear and perso… A3349767W 1988 Apr 2.3
2 Northern Territory Clothing, footwear and perso… A3349767W 1988 May 2.9
3 Northern Territory Clothing, footwear and perso… A3349767W 1988 Jun 2.6
4 Northern Territory Clothing, footwear and perso… A3349767W 1988 Jul 2.8
5 Northern Territory Clothing, footwear and perso… A3349767W 1988 Aug 2.9
6 Northern Territory Clothing, footwear and perso… A3349767W 1988 Sep 3
1. Use an STL decomposition to calculate the trend-cycle and seasonal indices. If you deem it necessary, adjust the trend and season windows (solved)
NOTE: remember that the STL decomposition is only applicable to additive schemes, which means you need to transform the data first to obtain an additive scheme. To pick the best transformation, I am going to use the guerrero feature, something we will see in the following session. For now take this analysis as a given (I leave the analysis so that you understand it later):
retail_series %>% autoplot()Plot variable not specified, automatically selected `.vars = Turnover`
# let us examine the guerrero feature to see which box-cox transformation would
# be more appropriate.
lambda <- retail_series %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
lambda[1] 0.08303631
This means that a log transformation should be applied (the estimated \(\lambda\) is close to 0). In this particular case we are not going to adjust the STL windows. We are going to specify the log transformation within the model formula. We will see later that this is important.
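For reference, the classic Box-Cox transformation that the Guerrero feature is selecting \(\lambda\) for is:

\[ w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0 \\ \dfrac{y_t^{\lambda} - 1}{\lambda} & \text{otherwise} \end{cases} \]

so an estimated \(\lambda \approx 0.08\) indicates that a plain log transformation is a reasonable choice.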
dcmp <- retail_series %>%
model(stl = STL(log(Turnover))) %>%
components()
dcmp %>% autoplot()
Now, having this decomposition, proceed to carry out the rest of the exercise.